DOC: Document that str.match accepts a regular expression #61879

Open
wants to merge 1 commit into
base: main
from

hamdanal

Similar to str.fullmatch and other methods that accept regular expressions

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Similar to str.fullmatch and other methods
@@ -1374,7 +1374,7 @@ def match(self, pat: str, case: bool = True, flags: int = 0, na=lib.no_default):
 Parameters
 ----------
 pat : str
-    Character sequence.
+    Character sequence or regular expression.
Member

By "regular expression", do you mean a string that is interpreted as a regular expression, or a compiled regular expression object?

To avoid confusion: if the former, then probably no doc change is needed. If the latter, the type hints in the signature would also need to be updated, and some code changes may be required.

Member

I think he meant a compiled regular expression; this is how we are trying to type it in the stubs.
I believe we should align all the docs: since these methods use the functions of re under the hood, and those functions support re.Pattern, a compiled regular expression is also accepted at runtime.
Looking at the docs, it is a bit unclear what "regular expression" means; I would assume it is just a regular string of the form r"...".
So the question is: should we allow compiled regular expressions, given that they are supported at runtime?
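
To make the distinction concrete, here is a minimal sketch using only the standard-library `re` module (not pandas itself). It shows that the module-level `re` functions accept either a pattern string or a compiled `re.Pattern`, and that a compiled pattern cannot be combined with extra flags. Whether every pandas string backend forwards `pat` to `re` in this way is an assumption that would need to be checked separately.

```python
import re

raw = r"BAD_*"              # a plain string that is interpreted as a regex
compiled = re.compile(raw)  # a compiled re.Pattern[str] object

# Module-level functions in `re` accept either form as the pattern argument.
assert re.match(raw, "BAD__x") is not None
assert re.match(compiled, "BAD__x") is not None

# Passing an already compiled pattern to re.compile returns it unchanged.
same = re.compile(compiled)

# Combining a compiled pattern with extra flags is rejected with ValueError.
flags_error = None
try:
    re.compile(compiled, flags=re.IGNORECASE)
except ValueError as exc:
    flags_error = str(exc)
```

The second half matters for any method that also takes `case` or `flags` arguments: a compiled pattern already carries its own flags, so the two cannot be merged silently.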

Member

The documentation is the official API. If the stubs have been updated to reflect the types that are accepted at runtime, isn't this the tail wagging the dog? If we update the documentation, then we also need to update the type annotations in the code and ensure that the behavior is tested.

Member

I see your point and you are correct. I think the confusion originally came from "regular expression" != "compiled regex".
But then I went into the stubs, and it seems like we are testing for it:

def test_replace_compiled_regex_mixed_object():
    pat = re.compile(r"BAD_*")
    ser = Series(
        ["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
    )
    result = Series(ser).str.replace(pat, "", regex=True)
    expected = Series(
        ["a", np.nan, "b", np.nan, np.nan, "foo", None, np.nan, np.nan], dtype=object
    )
    tm.assert_series_equal(result, expected)

So the question is what we mean by "regular expression": is it a compiled pattern or not? Once that is settled, we can:

  • clarify the docs
  • update the stubs accordingly, to allow or disallow re.Pattern[str]
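
As a rough illustration of what "allowing `re.Pattern[str]`" could look like in an annotation, here is a hypothetical helper. The name `normalize_pattern` and its behavior are assumptions made for this sketch; it is not pandas code and not the actual stub signature.

```python
from __future__ import annotations

import re


def normalize_pattern(pat: str | re.Pattern[str], flags: int = 0) -> re.Pattern[str]:
    """Hypothetical helper: accept a pattern string or a compiled pattern."""
    if isinstance(pat, re.Pattern):
        # A compiled pattern already carries its flags; refuse to mix in more.
        if flags:
            raise ValueError("flags cannot be combined with a compiled pattern")
        return pat
    # A plain string is compiled with the requested flags.
    return re.compile(pat, flags=flags)
```

If only strings were meant to be supported, the annotation would stay `pat: str` and the `isinstance` branch would raise a `TypeError` instead.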

Please let us know @simonjayhawkins.

simonjayhawkins added the Docs, API Design, and API - Consistency (Internal Consistency of API/Behavior) labels on Jul 17, 2025
3 participants